
    Plagiarism Detection in arXiv

    We describe a large-scale application of methods for finding plagiarism in research document collections. The methods are applied to a collection of 284,834 documents collected by arXiv.org over a 14-year period, covering a few research disciplines. The methodology efficiently detects a variety of problematic author behaviors, and heuristics are developed to reduce the number of false positives. The methods are also efficient enough to implement as a real-time submission screen for a collection many times larger. Comment: Sixth International Conference on Data Mining (ICDM'06), Dec 200
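    The abstract does not say which detection method is used. As a generic, hedged illustration, and not the arXiv method itself, the sketch below uses a common near-duplicate technique. It splits each document into overlapping k-word "shingles" and scores document pairs by Jaccard similarity. An inverted index ensures that only documents sharing at least one shingle are ever compared. The function names, `k=5`, and the `0.3` threshold are hypothetical choices made for this example.

```python
import re
from collections import defaultdict
from itertools import combinations


def shingles(text, k=5):
    """Return the set of k-word shingles (overlapping word n-grams) in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def flag_overlaps(docs, k=5, threshold=0.3):
    """Return (id_a, id_b, jaccard) for document pairs whose shingle sets
    overlap at or above threshold (hypothetical cutoff), highest first."""
    sets = {doc_id: shingles(text, k) for doc_id, text in docs.items()}

    # Inverted index: shingle -> ids of documents containing it.
    index = defaultdict(set)
    for doc_id, sh in sets.items():
        for s in sh:
            index[s].add(doc_id)

    # Count shared shingles only for pairs that co-occur in some posting list.
    shared = defaultdict(int)
    for postings in index.values():
        for a, b in combinations(sorted(postings), 2):
            shared[(a, b)] += 1

    flagged = []
    for (a, b), inter in shared.items():
        union = len(sets[a]) + len(sets[b]) - inter  # |A ∪ B| = |A| + |B| - |A ∩ B|
        score = inter / union
        if score >= threshold:
            flagged.append((a, b, score))
    return sorted(flagged, key=lambda t: -t[2])


if __name__ == "__main__":
    docs = {
        "a": "the quick brown fox jumps over the lazy dog near the river bank",
        "b": "the quick brown fox jumps over the lazy dog near the old mill",
        "c": "completely unrelated text about training scales for athletes and eating",
    }
    for a, b, score in flag_overlaps(docs):
        print(f"{a} <-> {b}: {score:.2f}")  # prints "a <-> b: 0.64"
```

    At a scale of hundreds of thousands of documents, very common shingles produce long posting lists. Practical systems therefore usually add filtering or sketching, such as MinHash, on top of this idea. The false-positive heuristics mentioned in the abstract are not modeled here.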

    Athletes’ Relationships with Training Scale (ART)

    The Athletes’ Relationships with Training Scale (ART)* is a self-report measure of unhealthy training behaviors and beliefs in athletes. The ART was designed for use by clinicians and athletic trainers to help identify athletes who engage in unhealthy training practices that could be associated with an eating disorder. The ART may also be helpful for tracking clinical outcomes in athletes with eating disorders who are receiving treatment. This record contains the 15-item ART as well as scoring instructions and guidelines for interpreting total scores.

    Modeling Additive Structure and Detecting Interactions with Groves of Trees

    Discovering additive structure is an important step toward understanding a complex multi-dimensional function, because it allows the function to be expressed as a sum of lower-dimensional or otherwise simpler components. Modeling additive structure also opens up opportunities for learning better regression models. The term statistical interaction describes the presence of non-additive effects among two or more variables in a function. When variables interact, their effects must be modeled and interpreted simultaneously. Detecting statistical interactions can therefore be critical for domain researchers seeking to understand the processes they study. This dissertation analyzes the benefits of modeling additive structure for prediction and interaction detection problems. It describes a new learning algorithm called Groves, which is an ensemble of additive regression trees. Groves builds on existing techniques such as bagging and additive models. Their combination allows us to use large trees in the ensemble while still modeling the additive structure of the response function. The regression version of the algorithm, Additive Groves, and its classification counterpart, Gradient Groves, yield consistently high performance across a variety of problems, outperforming a large number of other algorithms on average. The additive nature of Groves makes it particularly useful for interaction detection. This dissertation introduces a new approach to interaction detection based on comparing the performance of restricted and unrestricted predictive models. Groves of trees allow variable interactions to be carefully controlled and are therefore especially useful in this framework. The details of the proposed practical approach to interaction detection analysis are demonstrated on real data describing the abundance of different species of birds in the prairies east of the southern Rocky Mountains.
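    The idea of detecting an interaction by comparing restricted and unrestricted models can be illustrated with a small sketch, under loudly stated assumptions. This is not the Groves algorithm. The restricted model is an additive model fitted by backfitting, and each one-variable component is a table of per-bin means that stands in for a regression tree. The unrestricted model keeps one mean per two-dimensional cell, so it can capture an interaction between the two variables. All function names, the bin and sweep counts, and the synthetic test functions are hypothetical choices. If the restricted model's held-out error is clearly worse, that suggests the variables interact.

```python
import random


def bin_index(x, n_bins):
    """Map x in [0, 1) to one of n_bins equal-width bins."""
    return min(int(x * n_bins), n_bins - 1)


def fit_additive(X, y, n_bins=10, n_sweeps=10):
    """Restricted model: y ~ mean + sum_j f_j(x_j), fitted by backfitting.
    Each f_j is a per-bin mean table (hypothetical stand-in for a tree)."""
    n_features = len(X[0])
    mean = sum(y) / len(y)
    tables = [[0.0] * n_bins for _ in range(n_features)]
    for _ in range(n_sweeps):
        for j in range(n_features):
            # Refit component j to the partial residuals of all other components.
            sums, counts = [0.0] * n_bins, [0] * n_bins
            for row, target in zip(X, y):
                others = sum(tables[k][bin_index(row[k], n_bins)]
                             for k in range(n_features) if k != j)
                b = bin_index(row[j], n_bins)
                sums[b] += target - mean - others
                counts[b] += 1
            tables[j] = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return lambda row: mean + sum(tables[k][bin_index(row[k], n_bins)]
                                  for k in range(n_features))


def fit_joint(X, y, n_bins=10):
    """Unrestricted model: one mean per joint cell, so variables may interact."""
    sums, counts = {}, {}
    for row, target in zip(X, y):
        cell = tuple(bin_index(v, n_bins) for v in row)
        sums[cell] = sums.get(cell, 0.0) + target
        counts[cell] = counts.get(cell, 0) + 1
    overall = sum(y) / len(y)

    def predict(row):
        cell = tuple(bin_index(v, n_bins) for v in row)
        return sums[cell] / counts[cell] if cell in counts else overall
    return predict


def mse(model, X, y):
    return sum((model(r) - t) ** 2 for r, t in zip(X, y)) / len(y)


def interaction_gap(f, n_train=4000, n_test=2000, seed=0):
    """Held-out MSE of restricted minus unrestricted model on synthetic data
    y = f(x1, x2) + noise. A clearly positive gap suggests x1 and x2 interact."""
    rng = random.Random(seed)

    def sample(n):
        X = [(rng.random(), rng.random()) for _ in range(n)]
        return X, [f(a, b) + rng.gauss(0, 0.1) for a, b in X]

    X_tr, y_tr = sample(n_train)
    X_te, y_te = sample(n_test)
    return (mse(fit_additive(X_tr, y_tr), X_te, y_te)
            - mse(fit_joint(X_tr, y_tr), X_te, y_te))


if __name__ == "__main__":
    print(interaction_gap(lambda a, b: a + b))                    # additive target
    print(interaction_gap(lambda a, b: 4 * (a - 0.5) * (b - 0.5)))  # pure interaction
```

    On the additive target the gap should be close to zero. On the pure-interaction target the restricted model cannot represent the product term, so the gap should be clearly positive. The abstract notes that large trees and bagging are what let Groves control interactions while staying accurate, and this binned stand-in omits both.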